#model merging · 11/09/2025
mmBERT Unveiled: 3T Tokens, 1,833 Languages, and a 2–4× Speed Boost for Multilingual Encoding
mmBERT is a multilingual encoder pretrained on 3 trillion tokens across 1,833 languages. It runs 2–4× faster than previous multilingual encoders and supports 8k-token contexts. Its training recipe combines annealed language learning, an inverse masking schedule, and model merging to improve performance on both high-resource and low-resource benchmarks.
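To make the scheduling ideas concrete, here is a minimal Python sketch of how annealed language learning and an inverse masking schedule could interact across training phases. The phase names, temperature values, mask ratios, and token counts below are illustrative assumptions, not figures from the mmBERT release: lowering the sampling temperature flattens the language distribution (upweighting low-resource languages later in training) while the mask ratio is simultaneously decreased.

```python
import numpy as np

def temperature_sample_weights(token_counts, tau):
    """Temperature-based language sampling: p_i proportional to c_i ** tau.
    Lower tau flattens the distribution, shifting weight toward
    low-resource languages."""
    counts = np.asarray(token_counts, dtype=np.float64)
    probs = counts ** tau
    return probs / probs.sum()

# Hypothetical three-phase schedule in the spirit of the described recipe:
# the sampling temperature anneals down while the mask ratio also drops
# ("inverse masking" relative to a fixed-ratio MLM setup).
phases = [
    {"name": "pretrain",  "tau": 0.7, "mask_ratio": 0.30},
    {"name": "mid-train", "tau": 0.5, "mask_ratio": 0.15},
    {"name": "decay",     "tau": 0.3, "mask_ratio": 0.05},
]

# Toy corpus sizes (tokens) for a high-, mid-, and low-resource language.
token_counts = {"en": 1_000_000_000, "sw": 50_000_000, "fo": 1_000_000}

for phase in phases:
    weights = temperature_sample_weights(list(token_counts.values()), phase["tau"])
    sampled = dict(zip(token_counts, weights.round(3)))
    print(f"{phase['name']:>9}: sample={sampled}  mask_ratio={phase['mask_ratio']}")
```

Running this shows the low-resource languages' sampling probabilities rising as the temperature anneals, which is the intuition behind pairing annealed language learning with a shrinking mask ratio late in training.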